Patent abstract:
Method, system, client and server for mapping a file. The present invention relates to a method, a system, a client and a server for mapping a file. The method comprises: enumerating files to be mapped (S110); obtaining, one by one from the enumerated files, an attribute value of each file to be mapped and passing the attribute value to a server end (S130); comparing the attribute value with resource codes stored at the server end and acquiring the resource code consistent with the attribute value and the category to which that resource code belongs (S150); and forming a match among the file to be mapped, the attribute value and the category according to the resource code consistent with the attribute value and the category to which it belongs, and recording the match in a result of the first mapping (S170). The method, system, client and server pass the attribute value of a file to be mapped to the server end and compare it with the resource codes stored at the server end and their categories, so as to identify whether the file is risky or safe. Because the server end is not bound by the storage limitation of the client, it can store a large number of feature codes and update them quickly and in a timely manner, so the feature codes at the server end are comprehensive, thereby improving file mapping efficiency.
Publication number: BR112014002425B1
Application number: R112014002425-1
Application date: 2012-07-09
Publication date: 2021-06-01
Inventors: Shuhui MEI; Anwu Liang
Applicant: Tencent Technology (Shenzhen) Company Limited
Main IPC class:
Description:

FIELD OF THE INVENTION
[001] The present invention relates, in general, to the field of data processing technology and, more particularly, to a method for scanning files, and to a client and a server thereof.
BACKGROUND OF THE INVENTION
[002] With the development of computer technology, people use a wide variety of files for work and entertainment. These files may be downloaded from the internet, obtained from a portable storage medium, or received from other people. Consequently, there is a high probability that files obtained in these various ways, as well as files stored on terminal devices such as computers and mobile phones, are suspicious. Moreover, viruses or Trojan horse programs contained in suspicious files can cause great damage to the files that people use.
[003] However, suspicious files are currently scanned using only the scanning engine of the local client and the local virus library. The client engine is an anti-virus engine whose virus signature database, the local virus library, is limited, while the number of viruses and Trojans grows faster than the local virus library can be updated; the local virus library can therefore only passively increase its update frequency.
[004] Because the virus signature database of the local virus library may not cover all viruses and Trojans, scanning suspicious files with the client engine is inefficient.
SUMMARY OF THE INVENTION
[005] Consequently, it is necessary to provide a method for scanning files that could improve scanning efficiency.
[006] In addition, it is necessary to provide a system for scanning files that could improve scanning efficiency.
[007] In addition, it is necessary to provide a client to scan files that could improve scanning efficiency.
[008] In addition, it is necessary to provide a server to scan files which could improve scanning efficiency.
[009] A method for scanning files includes: enumerating unscanned files; obtaining the assignments of the unscanned files from the enumerated files one by one and transmitting the assignments to a server; comparing the assignments with resources stored on the server, and obtaining the resources that are consistent with the assignments and the types to which those resources belong; and generating a mapping relationship among the unscanned files, the assignments and the types according to the resources consistent with the assignments and the types of the resources, and recording the mapping relationship in a result of the first scan.
[0010] A method of scanning files includes: enumerating unscanned files; obtaining assignments of unscanned files from the enumerated files one by one and transmitting the assignments to a server.
[0011] A method for scanning files includes: comparing assignments with resources stored on a server, and obtaining the resources that are consistent with the assignments and the types to which those resources belong; and generating a mapping relationship among the unscanned files, the assignments and the types according to the resources consistent with the assignments and the types of the resources, and recording the mapping relationship in a result of the first scan.
[0012] A system for scanning files includes a client and a server. The client includes: an enumeration module for enumerating files; and an assignment obtaining module for obtaining the assignments of unscanned files one by one and transmitting the assignments to the server. The server includes: a database for storing the resources and the resource types; a comparison module for comparing the assignments with the stored resources and obtaining the resources that are consistent with the assignments and the types to which those resources belong; and a mapping relationship module for generating a mapping relationship among the unscanned files, the assignments and the types according to the resources consistent with the assignments and the resource types, and for recording the mapping relationship in a result of the first scan.
[0013] A client for scanning files includes: an enumeration module for enumerating files; and an assignment obtaining module for obtaining the assignments of unscanned files one by one and transmitting the assignments to the server.
[0014] A server for scanning files includes: a database for storing the resources and the resource types; a comparison module for comparing the assignments with the stored resources and obtaining the resources that are consistent with the assignments and the types to which those resources belong; and a mapping relationship module for generating a mapping relationship among the unscanned files, the assignments and the types according to the resources consistent with the assignments and the resource types, and for recording the mapping relationship in a result of the first scan.
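The division of work between the client modules and the server modules in [0012] can be sketched as below. This is a minimal illustration, not the patented implementation: the class and method names are hypothetical, file contents are held in memory, MD5 is used as the assignment (the preferred embodiment), and an in-memory dictionary stands in for the server database.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server 30; `database` stands in for database 310 (assignment -> type)."""
    database: dict[str, str] = field(default_factory=dict)

    def compare(self, assignment: str) -> str:
        # Comparison module: find the stored resource consistent with the assignment.
        return self.database.get(assignment, "undefined")

    def scan(self, queries: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
        # Mapping relationship module: record (file name, assignment, type).
        return [(name, a, self.compare(a)) for name, a in queries]


@dataclass
class Client:
    """Hypothetical client 10; file contents are kept in memory for illustration."""
    files: dict[str, bytes]

    def enumerate_files(self) -> list[str]:
        # Enumeration module.
        return sorted(self.files)

    def get_assignments(self) -> list[tuple[str, str]]:
        # Assignment obtaining module: MD5 of each enumerated file, one by one.
        return [(n, hashlib.md5(self.files[n]).hexdigest()) for n in self.enumerate_files()]


payload = b"malicious payload"
server = Server({hashlib.md5(payload).hexdigest(): "blacklist"})
client = Client({"a.exe": payload, "b.txt": b"hello"})
first_scan_result = server.scan(client.get_assignments())
```

Keeping only the hash on the wire means the client never uploads file contents for the first scan; the server answers purely from its stored resources.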
[0015] The above method, system, client and server for scanning files upload file assignments to the server and identify whether a file is safe or dangerous by comparing the assignments with the corresponding resources and types. Since the server is not subject to the storage limit of the client, it can store a large number of resources and can update them quickly and in a timely manner, so the resources stored on the server are relatively comprehensive; in this way, the efficiency of scanning files is improved.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a flowchart of a method for scanning files according to one embodiment;
Figure 2 is a flowchart of a method for scanning files according to another embodiment;
Figure 3 is a flowchart of a method for scanning files according to another embodiment;
Figure 4 is a flowchart of a method for scanning files according to another embodiment;
Figure 5 is a flowchart of the method for locally scanning unscanned files according to the result of the first scan of Figure 4;
Figure 6 is a block diagram of a system for scanning files according to an embodiment;
Figure 7 is a block diagram of a client according to an embodiment;
Figure 8 is a block diagram of the file determination module of Figure 7;
Figure 9 is a block diagram of a client according to another embodiment;
Figure 10 is a block diagram of a client according to another embodiment;
Figure 11 is a block diagram of a client according to another embodiment.
DESCRIPTION OF PREFERRED EMBODIMENTS
[0016] Figure 1 is a flowchart of a method for scanning files according to an exemplary embodiment, which includes the following steps.
[0017] Step S110, enumerate unscanned files.
[0018] In the present embodiment, when the scanning engine of anti-virus or anti-Trojan software is launched, the user creates a scan request via the scanning page of the scanning engine; the created scan request is sent to the lower-level system hardware via an IPC (Inter-Process Communication) module and is further sent to the server via the lower-level system hardware. The scanner and the server obtain the unscanned files from the received scan request and then perform a targeted scan on the files according to the scan request. The IPC module is established between the scanning page of the scanner and the lower-level hardware to carry the communication between them, and further interconnects the scanner and the server.
[0019] In detail, the scan request includes a task ID, a scan layer and a mode for enumerating the file folder, where the scan layer depends on whether the user chooses to perform a quick scan, a full scan, or a custom scan. For example, in quick scan mode the scan speed is high while the scan layer is relatively shallow.
[0020] To scan files, the unscanned files are obtained according to the user's operation on the scanning page. The designated files are the unscanned files, which are then enumerated according to an established queue length, that is, allotted to enumerated queues of a particular length for the scan. In a preferred embodiment, the queue length is 20,000.
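The three fields of the scan request in [0019] can be pictured as a small record. This is a hedged sketch: the patent does not define an encoding, so the class names, the `enumerate_mode` string and its example value are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ScanLayer(Enum):
    # Scan layers corresponding to the option chosen on the scanning page.
    QUICK = "quick"    # high speed, relatively shallow layer
    FULL = "full"
    CUSTOM = "custom"


@dataclass(frozen=True)
class ScanRequest:
    task_id: int
    scan_layer: ScanLayer
    enumerate_mode: str  # hypothetical: how the file folder is enumerated


request = ScanRequest(task_id=1, scan_layer=ScanLayer.QUICK, enumerate_mode="recursive")
```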
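Allotting the designated files to enumerated queues of a fixed length, as described in [0020], can be sketched with a generator. The function name is hypothetical; 20,000 is the preferred length from the text, and in practice the file paths might come from a directory walk such as `os.walk`.

```python
from itertools import islice
from typing import Iterable, Iterator

QUEUE_LENGTH = 20_000  # preferred queue length from the text


def enumerated_queues(paths: Iterable[str], length: int = QUEUE_LENGTH) -> Iterator[list[str]]:
    """Split the designated unscanned files into queues of at most `length` entries."""
    it = iter(paths)
    while True:
        queue = list(islice(it, length))
        if not queue:
            return
        yield queue


queues = list(enumerated_queues([f"file{i}" for i in range(5)], length=2))
```

With five files and a length of 2, this yields three queues of sizes 2, 2 and 1.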
[0021] Step S130, obtain assignments from the unscanned files one by one and transmit the assignments to a server.
[0022] In the present embodiment, the assignments of the unscanned files are obtained to uniquely identify the unscanned files and can be used to verify the integrity of the unscanned files. In a preferred embodiment, the assignment of an unscanned file is its MD5 value.
[0023] The assignment of each of the enumerated unscanned files is obtained one by one to generate a query request containing information such as the assignment and the file name of the unscanned file; the generated query request is then sent to the server. The server may be a cloud platform built from multiple servers, in which servers can be added or removed according to demand, or a large-scale server group.
[0024] After the server scan on the files is triggered, if there are no enumerated unscanned files available for the server to scan, the scan waits for a predetermined time before retrying. The predetermined time may be 100 ms.
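Computing the MD5 assignment and wrapping it in a query request, as in [0022] and [0023], might look like the sketch below. The text only says the request carries the assignment and the file name; the JSON layout and function names are assumptions. The file is read in chunks so large files need not fit in memory.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 assignment of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_query_request(path: Path) -> str:
    # Hypothetical JSON layout for the query request sent to the server.
    return json.dumps({"assignment": md5_of_file(path), "filename": path.name})


with tempfile.TemporaryDirectory() as d:
    sample = Path(d) / "sample.bin"
    sample.write_bytes(b"abc")
    request = json.loads(build_query_request(sample))
```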
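The wait-and-retry behaviour in [0024] can be sketched as a polling loop. The 100 ms wait comes from the text; the `fetch` callback and the retry limit `max_retries` are assumptions, since the text does not say how many times the scan retries.

```python
import time
from typing import Callable, Optional

RETRY_WAIT_S = 0.1  # predetermined wait of 100 ms


def next_batch(fetch: Callable[[], list[str]], max_retries: int = 3,
               wait_s: float = RETRY_WAIT_S) -> Optional[list[str]]:
    """Return the next non-empty batch of unscanned files, waiting between retries."""
    for attempt in range(max_retries + 1):
        batch = fetch()
        if batch:
            return batch
        if attempt < max_retries:
            time.sleep(wait_s)  # wait the predetermined time before retrying
    return None


attempts = iter([[], ["a.exe"]])
batch = next_batch(lambda: next(attempts), wait_s=0.0)
empty = next_batch(lambda: [], max_retries=1, wait_s=0.0)
```

The same loop applies to the local scan in [0036], which uses the same 100 ms wait.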
[0025] Step S150, compare the assignments with the resources stored on the server, and obtain the resources consistent with the assignments and the types to which those resources belong.
[0026] In the present embodiment, the assignments may be MD5 values or hash values calculated from the unscanned files, where each assignment uniquely corresponds to one unscanned file. If an unscanned file has lost its integrity, its assignment differs from that of the intact file. A large number of resources and the types to which they belong are stored on the server. The stored resources and types are correlated with each other, meaning that each resource has a corresponding type. The server is searched according to the assignments of the unscanned files to find the resources consistent with the assignments, and further to find the types to which those resources belong according to the correlation between the resources and the corresponding types; the types found indicate whether the unscanned files are normal files, virus files or Trojans. For example, the type to which the resources of virus files belong is the blacklist, and files of the blacklist type are files containing viruses or Trojans; the type to which the resources of normal files belong is the whitelist, and files of the whitelist type are determined to be safe files that do not contain viruses or Trojans and are trusted to run. Suspicious files are of the greylist type: although files of the greylist type cannot be determined to be virus files or Trojans, they are active in the virus-sensitive parts of the system.
[0027] The resources consistent with the assignments of the unscanned files are found by comparing the assignments with the resources stored on the server, and the corresponding types are then determined from those resources. The types indicate whether the files matching the assignments are virus files, Trojans, normal files or suspicious files. If no resource consistent with an assignment is stored on the server, that is, the assignment does not occur among the large number of resources stored on the server, the file with that assignment is sorted into an undefined list.
[0028] Step S170, generate a mapping relationship among the unscanned files, the assignments and the types according to the resources consistent with the assignments and the types of the resources, and record the mapping relationship in a result of the first scan.
[0029] In the present embodiment, the file type can be determined through the comparison between the assignment and the resource, and a result of the first scan is then generated and sent to the user.
[0030] Steps S150 and S170 above are performed on the server, so that the file scan is conducted on the server.
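The server-side comparison of [0026] and [0027] is, in effect, a keyed lookup from assignment to type, with a fall-back to the undefined list when no resource matches. The sketch below assumes an in-memory dictionary as the resource store and uses made-up short strings in place of real MD5 values; the enum and function names are hypothetical.

```python
from enum import Enum


class FileType(Enum):
    BLACKLIST = "blacklist"   # virus or Trojan
    WHITELIST = "whitelist"   # safe, trusted to run
    GREYLIST = "greylist"     # suspicious, active in virus-sensitive parts of the system
    UNDEFINED = "undefined"   # no matching resource stored on the server


def first_scan(database: dict[str, FileType],
               queries: dict[str, str]) -> dict[str, tuple[str, FileType]]:
    """Map each file name to (assignment, type); unknown assignments become UNDEFINED."""
    return {
        name: (assignment, database.get(assignment, FileType.UNDEFINED))
        for name, assignment in queries.items()
    }


database = {"aa11": FileType.BLACKLIST, "bb22": FileType.WHITELIST}
result = first_scan(database, {"x.exe": "aa11", "y.dll": "cc33"})
```

The returned dictionary plays the role of the mapping relationship recorded in the result of the first scan (step S170).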
[0031] According to another embodiment, as shown in Figure 2, the following steps are included after step S170 above.
[0032] Step S210, locally determine the unscanned files for a local scan according to the result of the first scan.
[0033] In the present embodiment, in addition to the scan by the server, files can also undergo a local scan by the scanning engine. To improve efficiency and accuracy, the local file scan should be appropriately combined with the server-side file scan.
[0034] In detail, the result of the first scan returned from the server identifies the suspicious files and the files whose assignments have no corresponding resources on the server; to ensure the accuracy of the scan result, the suspicious files and the files in the undefined list are determined to be the files for the local scan.
[0035] In addition, a local scan must be conducted on the files that have not been scanned on the server, thereby ensuring that all files are scanned and obtain a corresponding scan result.
[0036] After the local scan is triggered, if no unscanned files for the local scan are found, the scan waits for a predetermined time before retrying. The predetermined time may be 100 ms.
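Selecting the files for the local scan, per [0034] and [0035], combines two groups: files whose first-scan type is greylist or undefined, and files the server never scanned. A minimal sketch, assuming the first-scan result is a dictionary from file name to type string; the function name is hypothetical.

```python
def files_for_local_scan(first_result: dict[str, str], enumerated: list[str]) -> list[str]:
    """Pick greylisted/undefined files plus files the server never scanned."""
    needs_second = [f for f in enumerated
                    if first_result.get(f) in ("greylist", "undefined")]
    not_on_server = [f for f in enumerated if f not in first_result]
    return needs_second + not_on_server


enumerated = ["a.exe", "b.dll", "c.sys", "d.txt"]
first_result = {"a.exe": "blacklist", "b.dll": "greylist", "c.sys": "undefined"}
local_files = files_for_local_scan(first_result, enumerated)
```

Here `d.txt` is included only because it has no entry in the first-scan result.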
[0037] Step S230, conduct a local scan on the unscanned files, thereby obtaining a result of the second scan.
[0038] In the present embodiment, the assignments of the unscanned files are obtained, and the local virus library is searched for the resources consistent with the obtained assignments and their corresponding types; whether the files are normal files, virus files or Trojans can then be determined from the types found.
[0039] Step S250, integrate the result of the second scan with the result of the first scan to generate a result of the third scan.
[0040] In the present embodiment, after the server scan and the local scan are finished, a result of the third scan can be determined by appropriately combining the result of the first scan and the result of the second scan.

[0041] In detail, as shown in the table above, for a given file, if the result of the first scan determines that the file type is blacklist, the result of the third scan for that file is consistent with the result of the first scan; if the result of the first scan determines that the file type is greylist or undefined, a result of the second scan must be obtained through a local scan, and the result of the third scan for that file is consistent with the result of the second scan.
[0042] In the above scanning method, the result of the third scan can be displayed to the user after it is generated; the user should be alerted to the risk of the blacklisted files according to the result of the third scan, and a cleaning operation can then be performed on these blacklisted files.
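The integration rule in [0041] can be sketched as a per-file merge. The text covers the blacklist, greylist and undefined cases; keeping a whitelist type from the first scan, and taking the local result for files the server never scanned, are assumptions made here for completeness. Type strings and the function name are hypothetical.

```python
def third_scan(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Combine the server (first) and local (second) results into the third result."""
    merged = {}
    for name, t in first.items():
        if t in ("greylist", "undefined") and name in second:
            merged[name] = second[name]   # defer to the local scan
        else:
            merged[name] = t              # blacklist kept; whitelist kept by assumption
    # Assumption: files never scanned on the server only have a local result.
    for name, t in second.items():
        merged.setdefault(name, t)
    return merged


first = {"a": "blacklist", "b": "greylist", "c": "undefined", "w": "whitelist"}
second = {"b": "whitelist", "c": "blacklist", "d": "greylist"}
merged = third_scan(first, second)
```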
[0043] In accordance with another embodiment, referring to Figure 3, the following steps may be included after step S250 above.
[0044] Step S310, remove from the enumerated queue the unscanned files corresponding to the files in the result of the third scan.
[0045] In the present embodiment, after the files are scanned on the server and locally, the scanned files must be removed from the enumerated queue; that is, the files are located according to the file entries in the result of the third scan and then removed from the enumerated files.
[0046] Step S330, determine whether there is blank space in the enumerated queue; proceed to step S350 if there is, otherwise end.
[0047] In the present embodiment, not all unscanned files are in the enumerated queue, because the enumerated files form an enumerated queue of a particular length; therefore, blank space in the enumerated queue must be found so that the files not yet added to the enumerated queue can be added to it.
[0048] In detail, after the scanned files are removed, the remaining unscanned files in the enumerated queue stay in their original positions; they are not moved or adjusted even though particular files have been removed. For example, if the file in the first position of the enumerated queue is removed after being scanned, the file in the second position does not move forward to fill the blank space in the first position. Consequently, an enumeration pointer searches the enumerated queue for blank space starting from the first position; when a blank space is found, an unscanned file that is not yet in the enumerated queue and is waiting to be scanned is added to the enumerated queue. After each blank space is filled, the search continues so that further unscanned files can be added to the enumerated queue.
[0049] Step S350, add unscanned files that are not included in the enumerated queue to the enumerated queue.
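Steps S310 to S350 describe a fixed-slot queue: scanned entries are blanked in place, and an enumeration pointer then walks from the first position, filling each blank with a file waiting outside the queue. A minimal sketch, assuming `None` marks a blank slot; the function names are hypothetical.

```python
from typing import Iterator, Optional


def remove_scanned(queue: list[Optional[str]], scanned: set[str]) -> None:
    """Step S310: blank out scanned files in place; other entries keep their positions."""
    for i, name in enumerate(queue):
        if name is not None and name in scanned:
            queue[i] = None


def refill(queue: list[Optional[str]], pending: Iterator[str]) -> None:
    """Steps S330/S350: walk from the first position and fill each blank slot."""
    for i, name in enumerate(queue):
        if name is None:
            nxt = next(pending, None)
            if nxt is None:
                return  # no more files waiting outside the queue
            queue[i] = nxt


queue = ["a", "b", "c"]
remove_scanned(queue, {"a", "c"})
refill(queue, iter(["d"]))
```

Not shifting entries keeps each file's position stable, so the positions recorded for files still being scanned remain valid.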
[0050] According to another embodiment, referring to Figure 4, the method for scanning files includes the following steps.
[0051] Step S401, enumerate unscanned files.
[0052] Step S402, determine whether the length of the enumerated unscanned files has reached a first threshold; if so, proceed to step S403, otherwise return to step S401.
[0053] In the present embodiment, while the unscanned files are being enumerated to generate the enumerated queue, the length of the generated queue is checked against a first threshold; if the threshold is reached, the server is triggered to scan the files. In a preferred embodiment, the first threshold may be 50, meaning that the server is triggered to scan files once 50 unscanned files have been enumerated.
[0054] Step S403, search the enumerated unscanned files for the unscanned files that meet the pre-established conditions.
[0055] In the present embodiment, after the server is triggered to scan the files, the enumerated files are searched for the unscanned files that can be scanned by the server. In a preferred embodiment, the pre-established condition may be PE (Portable Executable) files smaller than 3 MB. The pre-established conditions can be modified according to the actual processing capacity and user demands.
[0056] Step S405, obtain the assignments of the enumerated unscanned files one by one and send the assignments to the server.
[0057] Step S406, compare the assignments with the resources stored on the server, and obtain the resources consistent with the assignments and the types of the resources.
[0058] Step S407, generate a mapping relationship among the unscanned files, the assignments and the types according to the resources consistent with the assignments and the types of the resources, and record the mapping relationship in a result of the first scan.
[0059] Step S408, determine whether the length of the enumerated unscanned files has reached a second threshold; if so, proceed to step S409, otherwise return to step S401.
[0060] In the present embodiment, while the unscanned files are being enumerated to generate the enumerated queue, the length of the queue is checked against a pre-set second threshold; if the threshold is reached, the local scan is triggered. In a preferred embodiment, the second threshold may be 5,000, meaning that the local scan is triggered once the length of the enumerated queue reaches 5,000.
[0061] In a preferred embodiment, the second threshold is greater than the first threshold, because scanning files on the server requires network connections and data transmission and is therefore more time-consuming than local scanning. Furthermore, the resources stored on the server are more comprehensive, so completing the final scan result on the basis of the first scan result generated by the server scan improves scan accuracy while also saving total scan time.
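The pre-established condition in [0055] (PE files smaller than 3 MB) can be approximated as below. This is a simplified sketch: it only checks the two-byte DOS `MZ` signature, whereas a stricter check would also follow the header offset to the `PE\0\0` signature; reading "3M" as 3 MiB and the function names are assumptions.

```python
import tempfile
from pathlib import Path

MAX_SERVER_SCAN_BYTES = 3 * 1024 * 1024  # "less than 3M", read here as 3 MiB


def is_pe_file(path: Path) -> bool:
    """Heuristic: PE images start with the DOS 'MZ' signature."""
    with path.open("rb") as f:
        return f.read(2) == b"MZ"


def meets_server_conditions(path: Path) -> bool:
    return is_pe_file(path) and path.stat().st_size < MAX_SERVER_SCAN_BYTES


with tempfile.TemporaryDirectory() as d:
    pe = Path(d) / "tool.exe"
    pe.write_bytes(b"MZ" + bytes(62))
    txt = Path(d) / "notes.txt"
    txt.write_bytes(b"hello")
    pe_ok = meets_server_conditions(pe)
    txt_ok = meets_server_conditions(txt)
```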
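The two thresholds of [0053] and [0060] can be simulated by growing the enumerated queue one file at a time and recording the length at which each scan is first triggered. The constants are the preferred values from the text; the function names are hypothetical. Because the second threshold is larger, the server scan always starts first, which is the ordering [0061] relies on.

```python
FIRST_THRESHOLD = 50      # triggers the server scan
SECOND_THRESHOLD = 5_000  # triggers the local scan


def triggers(queue_length: int, first: int = FIRST_THRESHOLD,
             second: int = SECOND_THRESHOLD) -> tuple[bool, bool]:
    """Return (server_scan_triggered, local_scan_triggered) for a queue length."""
    return queue_length >= first, queue_length >= second


def trigger_points(total_files: int) -> dict[str, int]:
    """Queue length at which each scan is first triggered during enumeration."""
    fired: dict[str, int] = {}
    for length in range(1, total_files + 1):  # queue grows by one file per step
        server, local = triggers(length)
        if server:
            fired.setdefault("server", length)
        if local:
            fired.setdefault("local", length)
    return fired


points = trigger_points(6_000)
```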
[0062] Step S409, determine the files for the local scan according to the result of the first scan.
[0063] In a detailed embodiment, a step of marking the unscanned files whose assignments have been transmitted is added after step S405 above.
[0064] In this embodiment, after the file assignments are transmitted to the server, the files scanned by the server are marked.
[0065] Referring to Figure 5, step S409 above includes the following steps.
[0066] Step S501, determine the files for the second scan among the enumerated files according to the result of the first scan.
[0067] In the present embodiment, the files and their corresponding types are determined from the result of the first scan. If the type recorded for a file in the result of the first scan is greylist or undefined, the file may be suspicious, or no resource consistent with its assignment was found among the resources stored on the server, so its type cannot be determined; such files need a second scan.
[0068] Step S503, select the unmarked files from the enumerated unscanned files, and determine the files for the second scan and the unmarked files to be the files for the local scan.
[0069] In the present embodiment, the enumerated unscanned files that were not scanned on the server must also be scanned by the local scanning engine.
[0070] Step S410, scan the files determined for the local scan with the local scanner and generate a result of the second scan.
[0071] In the present embodiment, the detailed process of locally scanning the determined files and generating the result of the second scan is as follows: the files for the second scan and the unmarked files are scanned successively according to a pre-established priority.
[0072] When files are scanned successively according to the pre-set priority, the files for the second scan are scanned locally first; after they are finished, the files that are not PE files are scanned, and finally the PE files that do not meet the pre-set conditions are scanned. The scanning priority can be adjusted as needed.
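The priority order in [0072] maps naturally onto a stable sort over priority ranks. This sketch assumes that every qualifying PE file was already sent to the server, so the unmarked PE files are exactly those that fail the pre-set conditions; the `is_pe` lookup and function name are hypothetical.

```python
def local_scan_order(second_scan: list[str], unmarked: list[str],
                     is_pe: dict[str, bool]) -> list[str]:
    """Order: second-scan files, then unmarked non-PE files, then unmarked PE files."""
    priority = {name: 0 for name in second_scan}
    for name in unmarked:
        priority.setdefault(name, 2 if is_pe[name] else 1)
    # sorted() is stable, so files of equal priority keep their enumeration order.
    return sorted(priority, key=lambda n: priority[n])


order = local_scan_order(
    ["g1"],
    ["p1", "t1", "p2", "t2"],
    {"p1": True, "t1": False, "p2": True, "t2": False},
)
```

Changing the ranks in one place is enough to adjust the scanning priority, as the text allows.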
[0073] Step S411, integrate the second scan result with the first scan result to generate a third scan result.
[0074] Step S412, obtain from the result of the second scan the types of the files for the second scan.
[0075] In the present embodiment, since the relationship among file names, assignments and types is recorded in the result of the second scan, the types of the files that went through the second scan can be found from that result; it can therefore be determined from the type whether a file for the second scan is a virus file or a Trojan.
[0076] Step S413, determine according to the type whether the file for the second scan is dangerous; if it is, proceed to step S414, otherwise proceed to step S415.
[0077] In the present embodiment, whether the files for the second scan are dangerous can be determined from the corresponding types. For example, if the file type is blacklist, the file for the second scan contains viruses or Trojans and is therefore dangerous. Since the dangerous file type was determined through the local scan, the resources stored on the server are inadequate and need to be updated; the assignment of the file, together with the type determined in the second scan, is therefore stored as a resource.
[0078] Step S414, upload the assignment of the file for the second scan.
[0079] Step S415, scan the file for the second scan to determine a corresponding suspicion index.
[0080] In the present embodiment, when the type of the file for the second scan is determined not to be dangerous, the file may still be suspicious, so it must be scanned to determine its corresponding suspicion index.
[0081] Step S416, determine whether the suspicion index exceeds a suspicion threshold; if it does, return to step S414, otherwise end.
[0082] In the present embodiment, the security of a suspicious file can be assessed against the pre-established suspicion threshold. For example, if the suspicion threshold is set to 30% and the suspicion index exceeds 30%, the suspicious file is determined to be a virus file or a Trojan horse file; since the resource of the suspicious file is not stored on the server, it must be uploaded to the server and added to the blacklist.
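Steps S413 to S416 reduce to one decision: upload the file's assignment if the local scan found it dangerous, or if its suspicion index exceeds the threshold. How the suspicion index is computed is not specified, so this sketch takes it as an input; the in-memory `server_db` and the function names are hypothetical, and 30% is the example threshold from [0082].

```python
SUSPICION_THRESHOLD = 0.30  # 30%, the example threshold from the text

server_db: dict[str, str] = {}  # hypothetical stand-in for the server's resources


def should_upload(file_type: str, suspicion_index: float,
                  threshold: float = SUSPICION_THRESHOLD) -> bool:
    """Steps S413-S416: upload if dangerous, or if too suspicious."""
    if file_type == "blacklist":           # S413 -> S414
        return True
    return suspicion_index > threshold     # S415/S416 -> S414, or end


def report(assignment: str, file_type: str, suspicion_index: float) -> None:
    if should_upload(file_type, suspicion_index):
        server_db[assignment] = "blacklist"  # S414: store as a blacklist resource


report("aa11", "blacklist", 0.0)
report("bb22", "greylist", 0.45)
report("cc33", "greylist", 0.10)
```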
[0083] Figure 6 is a block diagram of a system for scanning files according to an embodiment; the system includes a client 10 and a server 30.
[0084] The client 10 includes an enumeration module 110 and an assignment obtaining module 120.
[0085] Enumeration module 110 is used to enumerate files.
[0086] In the present embodiment, when the scanning engine of anti-virus or anti-Trojan software is launched, the user creates a scan request via the scanning page of the scanning engine; the created scan request is sent to the lower-level system hardware via an IPC (Inter-Process Communication) module and is further sent to the server via the lower-level system hardware. The scanner and the server obtain the unscanned files from the received scan request and then perform a targeted scan on the files according to the scan request. The IPC module is established between the scanning page of the scanner and the lower-level hardware to carry the communication between them, and further interconnects the scanner and the server.
[0087] In detail, the scan request includes a task ID, a scan layer and a mode for enumerating the file folder, where the scan layer depends on whether the user chooses to perform a quick scan, a full scan, or a custom scan. For example, in quick scan mode the scan speed is high while the scan layer is relatively shallow.
[0088] To scan files, the unscanned files are obtained according to the user's operation on the scanning page. The designated files are the unscanned files, which are then enumerated according to an established queue length, that is, allotted to enumerated queues of a particular length for the scan. In a preferred embodiment, the queue length is 20,000.
[0089] The assignment obtaining module 120 is used to obtain the assignments of the unscanned files one by one and transmit the assignments to the server.
[0090] In the present embodiment, the assignment obtaining module 120 obtains the assignments of the unscanned files to uniquely identify the unscanned files, and the assignments can be used to verify the integrity of the unscanned files. In a preferred embodiment, the assignments of the unscanned files may be MD5 values.
[0091] The assignment obtaining module 120 obtains the assignment of each of the enumerated unscanned files one by one to generate a query request containing information such as the assignment and the file name of the unscanned file; the generated query request is then sent to the server. The server may be a cloud platform built from multiple servers, in which servers can be added or removed according to demand, or a large-scale server group.
[0092] After the server scan on the files is triggered, if there are no enumerated unscanned files available for the server to scan, the scan waits for a predetermined time before retrying. The predetermined time may be 100 ms.
[0093] The server 30 includes a database 310, a comparison module 320 and a mapping relationship module 330.
[0094] Database 310 is used to store the resources and resource types.
[0095] The comparison module 320 is used to compare the assignment with the resources that are stored and get those resources that are consistent with the assignments and the type that the resources belong to.
[0096] In the current modality, the assignments can be MD5 values or Hash values after completing the calculations in the unscanned files, where each assignment corresponds to a uniquely unscanned file. In the case where the unscanned file has no integrity, the corresponding assignment would be different than that of the embedded unscanned file. The large number of resources and the belonging types are stored on the server. The comparison module 320 checks the server according to the assignments of the unscanned files to find those resources that are consistent with the assignments and additionally to find the type that the resources belong to according to the correlation between the resources and the corresponding types , where found types are the types of unscanned file assignments that indicate that the unscanned files are normal files, files with viruses, or Trojans. For example, the file types that the resources of files with viruses belong to are blacklists, those files with the types of blacklists are files with viruses or Trojans; while the types that normal file resources belong to are whitelisting, these files with whitelisting types are determined to be safe files that do not contain viruses or Trojans and are trusted to run. For suspicious files whose types are greylisted; whereas these files with the greylist type could not be determined as virus files or Trojans, but are active in the virus sensitive parts of the system.
[0097] The comparison module 320 finds those resources that are consistent with the assignments of the unscanned files between comparing the assignments with the resources that are stored on the server and additionally the corresponding types are determined through the resources that are consistent with the assignments . Types indicate that the files matching the assignment are files with viruses, Trojans, or normal files or suspicious files. In the case where the resource consistent in accordance with the assignment is not stored on the server, it means that the large amount of resources stored on the server does not occur, while the files with such assignments are sorted in an undefined list.
[0098] The mapping relationship module 330 is used to generate a mapping relationship between unscanned files, assignments and types according to resources that are consistent with the assignments and types of the resources and record the relationship of mapping on a result of the first scan.
[0099] In this embodiment, the mapping relationship module 330 determines the file type by comparing the assignment with the resources; the first scan result is then completed and sent to the user.
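The mapping relationship recorded by module 330 can be pictured as one record per file. This sketch assumes the client has already computed each file's assignment and that the server's resources are available as an assignment-to-type dictionary; `ScanRecord` and `build_first_scan_result` are illustrative names, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRecord:
    """One entry of a scan result: the file, its assignment and its type."""
    file_name: str
    assignment: str
    file_type: str


def build_first_scan_result(assignments: dict, server_resources: dict) -> dict:
    """Map each file name to a (file, assignment, type) record.

    assignments: file name -> assignment computed on the client.
    server_resources: assignment -> type, standing in for the server's store.
    Files whose assignment has no matching resource are recorded as 'undefined'.
    """
    result = {}
    for file_name, assignment in assignments.items():
        file_type = server_resources.get(assignment, "undefined")
        result[file_name] = ScanRecord(file_name, assignment, file_type)
    return result
```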
[00100] According to another embodiment, referring to Figure 7, the client 10 includes, in addition to the enumeration module 110 and the assignment obtaining module 120, a file determination module 130, a scanning module 140 and a result integration module 150.
[00101] The file determination module 130 determines, on the client, which unscanned files are to undergo the local scan according to the first scan result.
[00102] In this embodiment, in addition to the scan performed through the server, files can also undergo a local scan with the local scanner. To improve efficiency and accuracy, the local scan should be combined appropriately with the server-side scan.
[00103] Specifically, from the first scan result returned by the server, the file determination module 130 can identify the suspicious files and the files whose assignments have no corresponding resource on the server. To ensure the accuracy of the scan result, the suspicious files and the files in the undefined list must be designated as files for the local scan.
[00104] In addition, the file determination module 130 must also submit to the local scan any files that have not yet been scanned on the server, thereby ensuring that every file is scanned and obtains a corresponding scan result.
[00105] After the local scan is triggered, if no files for the local scan are found, the client waits a predetermined time before retrying. The predetermined time may be 100 ms.
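The retry behaviour in [00105] amounts to a polling loop. In this sketch `find_files` is a hypothetical callback returning the files currently designated for the local scan, the 0.1 s default mirrors the 100 ms predetermined time, and the `max_attempts` bound is an added assumption so that the loop always terminates.

```python
import time
from typing import Callable, List


def wait_for_local_scan_files(find_files: Callable[[], List[str]],
                              retry_delay_s: float = 0.1,
                              max_attempts: int = 50) -> List[str]:
    """Return the files for the local scan, retrying while none are found."""
    for _ in range(max_attempts):
        files = find_files()
        if files:
            return files
        # Nothing to scan yet: wait the predetermined time (100 ms) and retry.
        time.sleep(retry_delay_s)
    # Hypothetical cut-off added so the sketch cannot loop forever.
    return []
```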
[00106] In a detailed embodiment, the client described above also includes a marking module, which marks the files whose assignments have been transmitted.
[00107] In this embodiment, after the assignments are transmitted to the server, the marking module marks the files that are scanned through the server.
[00108] Referring to Figure 8, the file determination module 130 includes a second scan unit 131 and a selection unit 133.
[00109] The second scan unit 131 determines, according to the first scan result, which of the enumerated files are to be scanned a second time.
[00110] In this embodiment, the second scan unit 131 determines each file and its type from the first scan result. If the type recorded for a file in the first scan result is greylist or undefined, the file may be suspicious, or no resource consistent with its assignment was found on the server; its nature therefore cannot be determined, and it needs to be scanned a second time.
[00111] The selection unit 133 chooses the unmarked unscanned files from the enumerated unscanned files, and designates the files to be scanned a second time, together with the unmarked unscanned files, as the files for the local scan.
[00112] In this embodiment, any of the enumerated unscanned files that were not scanned on the server must also be scanned by the local scan engine.
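The selection described in [00110] to [00112] can be sketched as a filter over the enumerated files, assuming the first scan result is available as a file-to-type dictionary and the marking module's output as a set of marked file names. `select_local_scan_files` is a hypothetical helper.

```python
def select_local_scan_files(enumerated: list, first_scan_types: dict,
                            marked: set) -> list:
    """Return the files for the local scan, preserving enumeration order.

    - Unmarked files were never sent to the server, so they must be scanned locally.
    - Marked files whose first-scan type is greylist or undefined need a second scan.
    - Marked files with a definite type (or no result yet) are skipped here.
    """
    needs_second_scan = {"greylist", "undefined"}
    selected = []
    for name in enumerated:
        if name not in marked:
            selected.append(name)  # never scanned by the server
        elif first_scan_types.get(name) in needs_second_scan:
            selected.append(name)  # the server could not decide
    return selected
```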
[00113] The scan module 140 scans the files determined for local scanning with the local scanner and generates a second scan result.
[00114] In this embodiment, the scan module 140 obtains the assignments of the unscanned files and checks the local virus library for the resources consistent with those assignments and their corresponding types; the type found then determines whether each file is a normal file or a file with a virus or Trojan.
[00115] Specifically, the scan module 140 also scans the files for the second scan and the unmarked files in succession, according to a pre-set priority.
[00116] When scanning in succession according to the pre-set priority, the scan module 140 first scans the files for the second scan locally; after those are finished, it scans the files that are not PE files, and finally the PE files that do not meet the pre-set conditions. The scanning priority can be adjusted as needed.
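The pre-set priority in [00116] maps naturally onto a stable sort with a small integer key. This sketch assumes the unmarked PE files are exactly those that failed the pre-set conditions (otherwise they would have been sent to the server and marked); `local_scan_order` and the `is_pe` lookup are hypothetical.

```python
def local_scan_order(second_scan: list, unmarked: list, is_pe: dict) -> list:
    """Order files for the local scan by the pre-set priority.

    0: files for the second scan
    1: unmarked files that are not PE files
    2: unmarked PE files (assumed to be those failing the pre-set conditions)
    """
    second_scan_set = set(second_scan)

    def priority(name: str) -> int:
        if name in second_scan_set:
            return 0
        return 2 if is_pe.get(name, False) else 1

    # De-duplicate while keeping first-seen order; sorted() is stable,
    # so files with equal priority keep that order.
    candidates = list(dict.fromkeys(second_scan + unmarked))
    return sorted(candidates, key=priority)
```

Changing the priority, as [00116] allows, only means changing the integers returned by `priority`.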
[00117] The result integration module 150 integrates the second scan result with the first scan result to generate a third scan result.
[00118] In this embodiment, after the server scan and the local scan are finished, the result integration module 150 determines the third scan result by appropriately combining the first scan result and the second scan result.
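The text does not specify how the two results are combined. One plausible rule, consistent with basing the final result on the server's first scan result as [00132] suggests, is sketched below: a local verdict is used only where the server's verdict is greylist, undefined or missing. Both the rule and the name `integrate_results` are assumptions.

```python
def integrate_results(first: dict, second: dict) -> dict:
    """Combine the first (server) and second (local) results into a third.

    Both inputs map file name -> type. Assumed rule: a definite server verdict
    is kept; a local verdict fills in where the server was undecided or silent.
    """
    undecided = {"greylist", "undefined"}
    third = dict(first)
    for name, local_type in second.items():
        if third.get(name, "undefined") in undecided:
            third[name] = local_type
    return third
```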
[00119] According to another embodiment, referring to Figure 9, the above client 10 also includes a removal module 160 and an addition module 170.
[00120] The removal module 160 removes from the enumerated queue the unscanned files corresponding to the files in the third scan result.
[00121] In this embodiment, after the files have been scanned on the server and locally, the removal module 160 must remove the scanned files from the enumerated queue; that is, the files are handled according to the file options in the third scan result and then removed from the enumerated files.
[00122] The addition module 170 determines whether there is blank space in the enumerated queue and, if there is, adds to the enumerated queue unscanned files that are not yet included in it.
[00123] In this embodiment, not all unscanned files are in the enumerated queue, because the enumerated files form a queue of a particular length. The addition module 170 therefore needs to find the blank spaces in the enumerated queue so that files not yet included can be added to it.
[00124] Specifically, after the scanned files are removed, the remaining unscanned files in the enumerated queue stay in their original positions; they are not moved or adjusted when other files are removed. For example, if the file in the first position of the enumerated queue is removed after being scanned, the file in the second position does not move forward to fill the blank space in the first position. Consequently, an enumeration pointer searches the enumerated queue for blank space starting from the first position; when a blank space is found, the addition module 170 fills it with an unscanned file that is not yet in the enumerated queue and is waiting to be scanned. The search then continues for further blank spaces to which unscanned files can be added.
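The enumerated queue of [00122] to [00124] behaves like a fixed array of slots in which removal leaves a hole rather than shifting later entries forward. The `EnumeratedQueue` class below is a hypothetical sketch of that behaviour, with `None` marking a blank slot and a deque holding the unscanned files still waiting outside the queue.

```python
from collections import deque
from typing import Deque, List, Optional


class EnumeratedQueue:
    """Fixed-length queue whose entries keep their position after removal."""

    def __init__(self, length: int, pending: List[str]):
        self.slots: List[Optional[str]] = [None] * length
        self.pending: Deque[str] = deque(pending)  # unscanned files not yet enqueued
        self.refill()

    def remove(self, name: str) -> None:
        """Blank the slot holding a scanned file; other entries stay put."""
        self.slots[self.slots.index(name)] = None

    def refill(self) -> None:
        """Search from the first position and fill each blank slot in turn."""
        for index, slot in enumerate(self.slots):
            if not self.pending:
                break
            if slot is None:
                self.slots[index] = self.pending.popleft()
```

For example, a queue of length 3 over files `a` to `e` starts as `[a, b, c]`; removing `a` gives `[None, b, c]`, and a refill puts `d` into the first slot without moving `b` or `c`.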
[00125] According to another embodiment, referring to Figure 10, the client 10 also includes an enumeration determination module 180 and a search module 190.
[00126] The enumeration determination module 180 determines whether the length of the enumerated unscanned files has reached a first threshold and, if so, informs the search module 190.
[00127] In this embodiment, while the unscanned files are being enumerated to generate the enumerated queue, the enumeration determination module 180 checks whether the length of the enumerated queue has reached a first threshold; if it has, the server is triggered to scan the files. In a preferred embodiment, the first threshold may be 50, meaning that the server is triggered to scan files once 50 unscanned files have been enumerated.
[00128] The search module 190 searches the enumerated unscanned files for those that meet the pre-set conditions.
[00129] In this embodiment, after the server is triggered to scan the files, the search module 190 searches the enumerated files for the unscanned files that can be scanned through the server. In a preferred embodiment, the pre-set condition may be that a file is a PE (portable executable) file smaller than 3M. The pre-set conditions can be modified according to actual processing capacity and user demands.
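The example pre-set condition in [00129] (a PE file smaller than 3M) could be checked as follows. PE files are recognised here by the standard layout: a DOS header starting with `MZ` whose `e_lfanew` field at offset 0x3C points to the four-byte signature `PE` followed by two NUL bytes. Reading "3M" as 3 MiB is an assumption, and the function names are hypothetical.

```python
import os
import struct

MAX_SERVER_SCAN_BYTES = 3 * 1024 * 1024  # "3M", assumed here to mean 3 MiB


def is_pe_file(path: str) -> bool:
    """Return True if the file has an MZ header whose e_lfanew points to a PE signature."""
    with open(path, "rb") as fh:
        header = fh.read(64)  # the DOS header is 64 bytes long
        if len(header) < 64 or header[:2] != b"MZ":
            return False
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)  # e_lfanew
        fh.seek(pe_offset)
        return fh.read(4) == b"PE\x00\x00"


def meets_preset_conditions(path: str) -> bool:
    """Hypothetical pre-set condition: a PE file smaller than 3 MiB."""
    return os.path.getsize(path) < MAX_SERVER_SCAN_BYTES and is_pe_file(path)
```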
[00130] The enumeration determination module 180 also determines whether the length of the enumerated unscanned files has reached a second threshold and, if so, informs the file determination module 130.
[00131] In this embodiment, while the unscanned files are being enumerated to generate the enumerated queue, the enumeration determination module 180 checks whether the length of the enumerated queue has reached a pre-set second threshold; if it has, the local scan is triggered. In a preferred embodiment, the second threshold may be 5,000, meaning that the local scan is triggered once the length of the enumerated queue reaches 5,000.
[00132] In a preferred embodiment, the second threshold should be greater than the first threshold, because scanning files on the server requires network connections and data transmission that take relatively more time than local scanning. Furthermore, the resources stored on the server are more complete, so basing the final scan result on the first scan result generated by the server scan improves scan accuracy while also saving total scan time.
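The two thresholds of [00127] to [00132] can be sketched as triggers fired during enumeration. The callbacks `on_server_scan` and `on_local_scan` are hypothetical hooks, the defaults use the preferred values 50 and 5,000, and firing each trigger once, exactly when its threshold is reached, is a simplifying assumption.

```python
from typing import Callable, Iterable, List


def enumerate_with_triggers(files: Iterable[str],
                            on_server_scan: Callable[[List[str]], None],
                            on_local_scan: Callable[[List[str]], None],
                            first_threshold: int = 50,
                            second_threshold: int = 5000) -> List[str]:
    """Enumerate files, firing each trigger once when the queue length reaches it."""
    if first_threshold >= second_threshold:
        raise ValueError("the second threshold should exceed the first")
    enumerated: List[str] = []
    for name in files:
        enumerated.append(name)
        if len(enumerated) == first_threshold:
            on_server_scan(list(enumerated))  # snapshot for the server scan
        if len(enumerated) == second_threshold:
            on_local_scan(list(enumerated))  # snapshot for the local scan
    return enumerated
```

With thresholds of 2 and 4 and five files, the server hook receives the first two files and the local hook the first four, reflecting the earlier server trigger that [00132] argues for.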
[00133] According to another embodiment, referring to Figure 11, the client 10 includes a type obtaining module 200, a risk determination module 210, an upload module 230 and a suspect index determination module 240.
[00134] The type obtaining module 200 obtains the type of the files for the second scan from the second scan result.
[00135] In this embodiment, since the relationship between file names, assignments and types is recorded in the second scan result, the type obtaining module 200 can find in the second scan result the type of each file that underwent the second scan; the type then determines whether that file is a virus file or a Trojan.
[00136] The risk determination module 210 determines, according to the type, whether a file for the second scan is dangerous; if it is, the module informs the upload module 230, and if it is not, it informs the scan module 140.
[00137] In this embodiment, the risk determination module 210 determines whether the files for the second scan are dangerous according to their types. For example, a file whose type is blacklist contains viruses or Trojans and is therefore dangerous. Because this dangerous type was determined only by the local scan, the resources stored on the server are incomplete and need to be updated; the assignment of the file, with the type determined in the second scan, is therefore stored as a resource.
[00138] The upload module 230 uploads the assignment of the file for the second scan to the server.
[00139] The scan module 140 also scans the file for the second scan to determine its corresponding suspect index.
[00140] In this embodiment, when the type of a file for the second scan is determined not to be dangerous, the file may still be suspicious, so the scan module 140 scans it to determine its suspect index.
[00141] The suspect index determination module 240 determines whether the suspect index exceeds a suspect threshold and, if it does, informs the upload module 230.
[00142] In this embodiment, the risk posed by a suspicious file can be judged against a pre-set suspect threshold. For example, if the suspect threshold is set to 30% and the suspect index determination module 240 determines that the suspect index exceeds 30%, the suspicious file is deemed to be a virus or Trojan horse file; since its resource is not on the server, the resource must be uploaded to the server and classified as blacklist.
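The decision flow of [00136] to [00142] (and claim 7) reduces to a small predicate. This sketch assumes that a blacklist type means dangerous, and that `compute_suspect_index` is a hypothetical hook returning the suspect index as a fraction in [0, 1]; the 0.30 default corresponds to the 30% example threshold.

```python
from typing import Callable


def decide_upload(file_type: str,
                  compute_suspect_index: Callable[[], float],
                  suspect_threshold: float = 0.30) -> bool:
    """Decide whether a second-scan file's assignment should be uploaded."""
    if file_type == "blacklist":
        # Dangerous file found only locally: the server lacks this resource.
        return True
    # Not dangerous: scan for a suspect index and upload only above the threshold.
    return compute_suspect_index() > suspect_threshold
```

Passing the index as a callable mirrors the text's ordering: the extra suspect-index scan runs only for files that were not already judged dangerous.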
[00143] The method, system, client and server for scanning files described above upload the assignments of files to the server and recognize whether files are safe or dangerous by comparing them with the corresponding resources and types. Because the server is not bound by the client's storage limits, it can store a large number of resources and update them quickly and in time, so the resources stored on the server are relatively comprehensive; this improves file scanning efficiency.
[00144] The method, system, client and server for scanning files described above combine local scanning with server-side resource comparison, which improves scanning accuracy.
[00145] The method, system, client and server for scanning files described above upload the assignments of dangerous files, and of second-scan files whose suspect index exceeds the suspect threshold, thereby continuously updating and supplementing the resources stored on the server and improving file scanning efficiency.
[00146] The embodiments described above illustrate only some exemplary embodiments of the present disclosure. It should be noted that persons skilled in the art may make alternative embodiments without departing from the spirit and scope of the present disclosure, and such alternatives shall fall within the scope of the claims of the present disclosure.
Claims (7)
1. Method for scanning files, comprising the steps of:
- enumerating (110, 403) unscanned files;
- obtaining (130, 405) assignments of the unscanned files from the enumerated files one by one, and transmitting the assignments to a server (30);
- comparing (150, 406) the assignments with resources stored on the server (30), and obtaining the resources that are consistent with the assignments and the types to which those resources belong;
- generating (170, 407) a mapping relationship between the unscanned files, the assignments and the types according to the resources consistent with the assignments and the types of those resources, and recording the mapping relationship in a first scan result;
characterized by the fact that it further comprises:
- determining (210, 409) locally the unscanned files for a local scan according to the first scan result;
- performing (230) a local scan on those unscanned files, thus obtaining a second scan result; and
- integrating (250, 411) the second scan result and the first scan result to generate a third scan result.
2. Method for scanning files according to claim 1, characterized in that, after integrating (250, 411) the second scan result with the first scan result to generate a third scan result, the method further comprises:
- removing (310) from an enumerated queue the unscanned files corresponding to the files in the third scan result.
3. Method for scanning files according to claim 2, characterized in that, after removing (310) from the enumerated queue the unscanned files corresponding to the files in the third scan result, the method further comprises:
- determining (330) whether there is blank space in the enumerated queue, and, if there is blank space, adding to the enumerated queue unscanned files that are not included in it.
4. Method for scanning files according to any one of claims 1 to 3, characterized in that, before obtaining (130, 405) the assignments of the unscanned files from the enumerated files one by one, the method further comprises:
- determining (402) whether a length of the enumerated unscanned files has reached a first threshold;
- if the length reaches the first threshold, searching (403) the enumerated unscanned files for unscanned files meeting pre-set conditions, and obtaining the assignments of those unscanned files from the enumerated files one by one;
wherein, before determining (210, 409) locally the unscanned files for a local scan according to the first scan result, the method further comprises:
- determining (408) whether the length of the enumerated unscanned files has reached a second threshold, and, if the length reaches the second threshold, determining locally the unscanned files for a local scan according to the first scan result.
5. Method for scanning files according to any one of claims 1 to 4, characterized in that, after transmitting the assignments to a server (30), the method further comprises:
- marking the unscanned files whose assignments have been transmitted;
wherein determining locally the unscanned files for a local scan according to the first scan result comprises:
- determining, according to the first scan result, the files to be scanned a second time among the enumerated files;
- choosing the unmarked unscanned files from the enumerated unscanned files, and designating the files to be scanned a second time and the unmarked unscanned files as the files for the local scan.
6. Method for scanning files according to claim 5, characterized in that performing the local scan on the unscanned files and obtaining a second scan result comprises:
- scanning in succession the files for the second scan and the unmarked files according to a pre-set priority.
7. Method for scanning files according to any one of claims 1 to 6, characterized in that, after integrating (250, 411) the second scan result and the first scan result to generate the third scan result, the method further comprises:
- obtaining (412) the type of a file for the second scan from the second scan result;
- determining (413) the risk of the file for the second scan according to the type, and uploading (414) the assignment of the file for the second scan if the file is risky;
- scanning (415) the file for the second scan to determine a corresponding suspect index if the file is not risky;
- determining (416) whether the suspect index has exceeded a suspect threshold, and uploading the assignment of the file for the second scan if the suspect index exceeds the suspect threshold.
Patent family:
Publication number | Publication date
CN102915421B | 2013-10-23
RU2014102898A | 2015-09-10
US9069956B2 | 2015-06-30
RU2581560C2 | 2016-04-20
CN102915421A | 2013-02-06
US20140157408A1 | 2014-06-05
EP2741227B1 | 2017-01-04
EP2741227A1 | 2014-06-11
WO2013017004A1 | 2013-02-07
BR112014002425A2 | 2017-02-21
EP2741227A4 | 2015-07-22
Legal status:
2018-12-11 | B06F | Objections, documents and/or translations needed after an examination request according to [chapter 6.6 patent gazette]
2019-11-05 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2021-04-20 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2021-06-01 | B16A | Patent or certificate of addition of invention granted. Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS FROM 09/07/2012, SUBJECT TO THE LEGAL CONDITIONS.
Priority:
Application number | Publication number | Priority date | Application date | Title
CN201110222738.9 | | | 2011-08-04 |
CN2011102227389A | CN102915421B | 2011-08-04 | 2011-08-04 | Method and system for scanning files
PCT/CN2012/078387 | WO2013017004A1 | 2011-08-04 | 2012-07-09 | Method, system, client and server for scanning file